Expressing Arbitrary Reward Functions as Potential-Based Advice

Authors

Anna Harutyunyan

Sam Devlin

Peter Vrancx

Ann Nowé

Abstract

Effectively incorporating external advice is an important problem in reinforcement learning, especially as it moves into the real world. Potential-based reward shaping is a way to provide the agent with a specific form of additional reward, with the guarantee of policy invariance. In this work we give a novel way to incorporate an arbitrary reward function with the same guarantee, by implicitly translating it into the specific form of dynamic advice potentials, which are maintained as an auxiliary value function learnt at the same time. We show that advice provided in this way captures the input reward function in expectation, and demonstrate its efficacy empirically.
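The mechanism in the abstract can be illustrated with a minimal tabular sketch. It rests on two assumptions. First, the auxiliary potential Φ(s,a) is learned with SARSA-style TD updates on the negated advice reward −R†(s,a). Second, the shaping term uses the dynamic potential-based form F = γΦ_{t+1}(s′,a′) − Φ_t(s,a). If Φ reaches a fixed point Φ(s,a) = E[−R†(s,a) + γΦ(s′,a′)], rearranging gives E[γΦ(s′,a′) − Φ(s,a)] = E[R†(s,a)]. This is the sense in which the advice "captures the input reward function in expectation." The function names, the deterministic ring environment, the step size β, and the advice values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def update_advice_potential(phi, s, a, advice_reward, s_next, a_next, gamma, beta):
    """One SARSA-style TD update of the auxiliary potential Phi(s, a).

    Assumption of this sketch: Phi is trained on the negated advice reward
    -R_dagger(s, a). Returns the dynamic shaping term
    F = gamma * Phi_{t+1}(s', a') - Phi_t(s, a), which would be added to the
    environment reward seen by the main learner.
    """
    phi_old = phi[(s, a)]  # Phi_t(s, a)
    td_error = -advice_reward + gamma * phi[(s_next, a_next)] - phi_old
    phi[(s, a)] = phi_old + beta * td_error  # Phi_{t+1}(s, a)
    # Phi_{t+1}(s', a') differs from Phi_t(s', a') only if (s', a') == (s, a).
    return gamma * phi[(s_next, a_next)] - phi_old


def demo(n_states=4, gamma=0.9, beta=0.5, sweeps=2000):
    """Hypothetical deterministic ring: state s always moves to (s + 1) % n_states."""
    advice = {s: float(s + 1) for s in range(n_states)}  # illustrative R_dagger(s)
    phi = defaultdict(float)  # potentials start at zero
    action = 0  # a single action, so the behaviour policy is fixed
    shaping = {}
    for _ in range(sweeps):
        for s in range(n_states):
            s_next = (s + 1) % n_states
            shaping[s] = update_advice_potential(
                phi, s, action, advice[s], s_next, action, gamma, beta
            )
    return advice, shaping


if __name__ == "__main__":
    advice, shaping = demo()
    for s in sorted(advice):
        print(s, advice[s], round(shaping[s], 6))
```

Because the ring is deterministic, the expectation is exact here. After enough sweeps, each shaping value agrees with its advice value up to floating-point error. In a full agent, the main SARSA or Q-learning update would use r + F in place of r. Giving Φ its own step size β is a choice made for this sketch.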

Similar resources

Principled Methods for Advising Reinforcement Learning Agents

An important issue in reinforcement learning is how to incorporate expert knowledge in a principled manner, especially as we scale up to real-world tasks. In this paper, we present a method for incorporating arbitrary advice into the reward structure of a reinforcement learning agent without altering the optimal policy. This method extends the potential-based shaping method proposed by Ng et al....

COVARIANCE MATRIX OF MULTIVARIATE REWARD PROCESSES WITH NONLINEAR REWARD FUNCTIONS

Multivariate reward processes with reward functions of constant rates, defined on a semi-Markov process, were first studied by Masuda and Sumita (1991). Reward processes with nonlinear reward functions were introduced in Soltani (1996). In this work we study a multivariate process , , where are reward processes with nonlinear reward functions respectively. The Laplace transform of the covar...

Asymptotic Behavior of Multivariate Reward Processes with Nonlinear Reward Functions

Advice Generation from Observed Execution: Abstract Markov Decision Process Learning

An advising agent, a coach, provides advice to other agents about how to act. In this paper we contribute an advice generation method using observations of agents acting in an environment. Given an abstract state definition and partially specified abstract actions, the algorithm extracts a Markov Chain, infers a Markov Decision Process, and then solves the MDP (given an arbitrary reward signal)...

Analytical Solution for Two-Dimensional Coupled Thermoelastodynamics in a Cylinder

An infinitely long hollow cylinder containing isotropic linear elastic material is considered under the effect of arbitrary boundary stress and thermal condition. The two-dimensional coupled thermoelastodynamic PDEs are specified based on equations of motion and energy equation, which are uncoupled using Nowacki potential functions. The Laplace integral transform and Bessel-Fourier series are u...

Publication date: 2015
